MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders
Abstract
MOTIVATION Locating transcription factor binding sites (motifs) is a key step in understanding gene regulation. According to Tompa's benchmark study, the performance of current de novo motif finders is far from satisfactory (sensitivity ≤ 0.222 and precision ≤ 0.307). The same study also shows that no motif finder performs consistently well across all datasets, so it is not clear which finder one should use for a given dataset. To address this issue, a class of algorithms called ensemble methods has been proposed. Although existing ensemble methods perform better overall than stand-alone motif finders, the improvement is not substantial. Our study reveals that these methods do not fully exploit the information in the results of the individual finders, which yields only a minor improvement in sensitivity and poor precision. RESULTS In this article, we present several key observations on how to use the results of individual finders and, based on them, design a novel ensemble method, MotifVoter, to predict motifs and binding sites. Evaluations on 186 datasets show that MotifVoter locates more than 95% of the binding sites found by its component motif finders. In terms of sensitivity and precision, MotifVoter significantly outperforms both stand-alone motif finders and other ensemble methods on Tompa's benchmark, Escherichia coli, and ChIP-chip datasets. MotifVoter is available online via a web server with several biologist-friendly features.
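The sensitivity and precision figures above compare predicted binding sites with known ones. Tompa's assessment reports these statistics at both the nucleotide and the site level; the sketch below assumes the nucleotide-level variants, where sensitivity = TP / (TP + FN) and precision = TP / (TP + FP) are counted over sequence positions. The site representation `(sequence id, start, end)` with an exclusive end and the function names are hypothetical, chosen only for illustration; this is not MotifVoter's own evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical site representation: (sequence id, start, end), end exclusive.
Site = Tuple[str, int, int]


def covered_positions(sites: Iterable[Site]) -> Set[Tuple[str, int]]:
    """Expand site intervals into the set of (sequence id, position) pairs they cover.

    Overlapping sites are merged automatically because positions are stored in a set.
    """
    positions: Set[Tuple[str, int]] = set()
    for seq_id, start, end in sites:
        positions.update((seq_id, pos) for pos in range(start, end))
    return positions


def nucleotide_sensitivity_precision(
    known: Iterable[Site], predicted: Iterable[Site]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) counted over nucleotide positions."""
    truth = covered_positions(known)
    pred = covered_positions(predicted)
    tp = len(truth & pred)   # positions both annotated and predicted
    fn = len(truth - pred)   # annotated positions that were missed
    fp = len(pred - truth)   # predicted positions outside any annotated site
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if pred else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    known_sites = [("seq1", 10, 20)]
    predicted_sites = [("seq1", 15, 25), ("seq2", 0, 5)]
    sn, ppv = nucleotide_sensitivity_precision(known_sites, predicted_sites)
    print(sn, round(ppv, 3))  # prints: 0.5 0.333
```

In the example, 5 of the 10 annotated positions are recovered (sensitivity 0.5), while 10 of the 15 predicted positions fall outside the annotated site (precision 5/15).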
Similar resources
MProfiler: A Profile-Based Method for DNA Motif Discovery
Motif finding is one of the important tasks in gene regulation analysis and is essential for understanding biological cell functions. According to the study by Tompa et al., the performance of current motif finders is not satisfactory. A number of ensemble methods have been proposed to improve the results. The overall performance of existing ensemble methods is better than that of stand-alone motif finders. A recent ensemble met...
Discovery of transcription factor binding sites through integration of generic motif finders
Locating transcription factor binding sites is a key step in understanding gene regulation. Due to its importance, many de novo motif finding methods (e.g. MEME, MotifSampler, Mitra and Weeder) have been proposed. Individually, these motif finders perform unimpressively overall on Tompa’s benchmark datasets. Moreover, these motif finders vary in their definitions of what constitutes a moti...
SCOPE: a web server for practical de novo motif discovery
SCOPE is a novel parameter-free method for the de novo identification of potential regulatory motifs in sets of coordinately regulated genes. The SCOPE algorithm combines the output of three component algorithms, each designed to identify a particular class of motifs. Using an ensemble learning approach, SCOPE identifies the best candidate motifs from its component algorithms. In tests on exper...
Fine-Grained Photovoltaic Output Prediction Using a Bayesian Ensemble
Local and distributed power generation is increasingly reliant on renewable power sources, e.g., solar (photovoltaic or PV) and wind energy. The integration of such sources into the power grid is challenging, however, due to their variable and intermittent energy output. To use them effectively on a large scale, it is essential to be able to predict power generation at a fine-grained level. We d...
CE3 Customizable and Easily Extensible Ensemble Tool for Motif Discovery
Ensemble methods (or simply ensembles) for motif discovery represent a relatively new approach to improve the accuracy of standalone motif finders. In particular, the accuracy of an ensemble is determined by the included finders and the strategy (learning rule) used to combine the results returned by the latter, making these choices crucial for the ensemble success. In this research we propose ...
Journal:
- Bioinformatics
Volume 24, Issue 20
Publication date: 2008